refactor(core): rename benchmark → project for registry + sync (1/4) by christso · Pull Request #1242 · EntityProcess/agentv

christso · 2026-05-14T22:48:29Z

Summary

PR 1 of 4 in the benchmark → project rename. Scope: internal @agentv/core symbols only. Wire formats (HTTP routes, JSON field keys, CLI flag descriptions, Studio routes/components, docs) are unchanged in this PR and will land in:

PR 2 — HTTP API routes + JSON field keys (`benchmark_id` → `project_id`, etc.) + CLI flag descriptions
PR 3 — Studio frontend (TanStack routes, components, hooks, types)
PR 4 — Docs + examples + skills cards

Renames

| Before | After |
|---|---|
| `packages/core/src/benchmarks.ts` | `projects.ts` |
| `packages/core/src/benchmark-sync.ts` | `project-sync.ts` |
| `BenchmarkEntry` / `BenchmarkSource` / `BenchmarkRegistry` | `ProjectEntry` / `ProjectSource` / `ProjectRegistry` |
| `loadBenchmarkRegistry`, `saveBenchmarkRegistry` | `loadProjectRegistry`, `saveProjectRegistry` |
| `addBenchmark`, `removeBenchmark`, `getBenchmark`, `touchBenchmark` | `addProject`, `removeProject`, `getProject`, `touchProject` |
| `discoverBenchmarks`, `deriveBenchmarkId`, `getBenchmarksRegistryPath` | `discoverProjects`, `deriveProjectId`, `getProjectsRegistryPath` |
| `syncBenchmark`, `syncBenchmarks` | `syncProject`, `syncProjects` |
| `~/.agentv/benchmarks.yaml` (top-level `benchmarks:`) | `~/.agentv/projects.yaml` (top-level `projects:`) |
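For callers, the type renames should change names only, not shapes. The sketch below uses local stand-in declarations, not the real `@agentv/core` types. It models only what the UAT output in the test plan exposes: a `projects` array whose entries have an `id` and, on some entries, a `source`. The `listProjectIds` helper is hypothetical.

```typescript
// Local stand-ins for illustration only; the real declarations live in
// packages/core/src/projects.ts and may carry more fields than shown here.
type ProjectSource = Record<string, unknown>; // opaque: its real fields are not shown in this PR

interface ProjectEntry {
  id: string;
  source?: ProjectSource; // optional: the UAT fixture has a source on "beta" only
}

interface ProjectRegistry {
  projects: ProjectEntry[]; // stored under the top-level `projects:` key (formerly `benchmarks:`)
}

// Hypothetical helper mirroring the UAT one-liner `r.projects.map(p => p.id)`.
function listProjectIds(registry: ProjectRegistry): string[] {
  return registry.projects.map((p) => p.id);
}

const example: ProjectRegistry = {
  projects: [{ id: "alpha" }, { id: "beta", source: {} }],
};
console.log(listProjectIds(example));
```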

One-time legacy file migration

`loadProjectRegistry()` calls `migrateLegacyBenchmarksFile()` before reading the registry. It handles four possible starting states:

| State | Behavior |
|---|---|
| Only `benchmarks.yaml` exists | Read → rewrite top-level key → write temp → `renameSync` to `projects.yaml` → `unlinkSync` old. One log line. |
| Only `projects.yaml` exists | No-op. |
| Both exist | `projects.yaml` wins. `stderr` warning. Legacy left in place for operator review. |
| Neither exists | No-op (fresh install). |
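The four rows reduce to a single decision over two `existsSync` checks. This is a minimal sketch. The names `RegistryMigrationAction`, `decideMigration`, and `decideForAgentvDir` are hypothetical and do not come from the PR, and the real `migrateLegacyBenchmarksFile()` may be structured differently.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Hypothetical action names covering the four rows of the state table.
type RegistryMigrationAction = "migrate" | "warn-keep-both" | "noop";

function decideMigration(legacyExists: boolean, projectsExists: boolean): RegistryMigrationAction {
  if (legacyExists && projectsExists) return "warn-keep-both"; // projects.yaml wins; legacy left for review
  if (legacyExists) return "migrate"; // only benchmarks.yaml: rewrite it and move it into place
  return "noop"; // only projects.yaml, or a fresh install with neither file
}

// Applies the decision to an `.agentv` directory. The directory is a parameter here
// (an assumption for testability) rather than always resolving to ~/.agentv.
function decideForAgentvDir(agentvDir: string): RegistryMigrationAction {
  return decideMigration(
    existsSync(join(agentvDir, "benchmarks.yaml")),
    existsSync(join(agentvDir, "projects.yaml")),
  );
}
```

A second load sees only `projects.yaml` and lands in `noop`, so repeated calls are idempotent.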

The temp+rename pattern keeps `projects.yaml` from ever being half-written; the legacy file is only removed after the rename succeeds.
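Below is a minimal sketch of the "only `benchmarks.yaml` exists" path using Node's `fs` API, which Bun also implements. Two simplifications apply, and both are labeled in the code. First, the hypothetical `renameTopLevelKey` rewrites a column-0 `benchmarks:` line instead of parsing and re-serializing YAML. Second, the function names are illustrative, not the PR's. The temp file sits in the same directory as the target, so on POSIX filesystems `renameSync` swaps it into place in one step.

```typescript
import { readFileSync, renameSync, rmSync, unlinkSync, writeFileSync } from "node:fs";
import { dirname, join } from "node:path";

// Assumption: the real code parses YAML; this rewrites only a `benchmarks:` key at
// column 0 (the first such line), leaving nested keys with that name untouched.
function renameTopLevelKey(yamlText: string): string {
  return yamlText.replace(/^benchmarks:/m, "projects:");
}

// Hypothetical name; sketches the read → rewrite → temp → rename → unlink sequence.
function migrateLegacyFileSketch(legacyPath: string, projectsPath: string): void {
  const migrated = renameTopLevelKey(readFileSync(legacyPath, "utf8"));
  // Same directory as the target so the rename never crosses filesystems.
  const tmpPath = join(dirname(projectsPath), `.projects.yaml.${process.pid}.tmp`);
  try {
    writeFileSync(tmpPath, migrated, "utf8");
    // Readers see either no projects.yaml or the complete file, never a partial write.
    renameSync(tmpPath, projectsPath);
  } catch (err) {
    rmSync(tmpPath, { force: true }); // best-effort cleanup; benchmarks.yaml is still intact
    throw err;
  }
  // Only reached after the rename succeeded.
  unlinkSync(legacyPath);
}
```

If anything before `unlinkSync` throws, the legacy file survives and the next `loadProjectRegistry()` call can retry the migration.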

Why "project"

5 of 6 LLM observability tools (Phoenix, Langfuse, Braintrust, W&B Weave, LangSmith) use `project` for the container that holds eval runs, traces, and datasets. agentv is adding trace/span/latency capture alongside eval runs, making "benchmark" too narrow. The rename also disambiguates from the academic "benchmark = eval suite" usage that's retained in example directory names (`benchmark-tooling`, `multi-model-benchmark`, etc.) — those genuinely are benchmark suites and stay named that way.

Test plan

`bun run typecheck` — passes
`bun run lint` — clean
`bun run test` — 2374 tests pass (1768 core including 4 new migration tests + 67 eval + 539 cli, 0 fail)
`bun run build` — all packages build
Pre-push hooks pass (including `validate:examples` over 56 example evals)
Red/green UAT against a simulated home directory at `/tmp/uat-rename-home/`:

```
$ ls /tmp/uat-rename-home/.agentv/
benchmarks.yaml # legacy fixture with 2 entries (alpha + beta with source)

$ HOME=/tmp/uat-rename-home bun -e "import { loadProjectRegistry } from '@agentv/core';
const r = loadProjectRegistry(); console.log(r.projects.map(p => p.id));"
[agentv] Migrated registry: benchmarks.yaml → projects.yaml (2 entries)
[ "alpha", "beta" ]

$ ls /tmp/uat-rename-home/.agentv/
projects.yaml # legacy file gone, content preserved including .source
```

Second load → silent no-op

```
$ HOME=/tmp/uat-rename-home bun -e "...loadProjectRegistry()..."
[ "alpha", "beta" ] # no migration log line (idempotent)
```

Both-files conflict → projects.yaml wins, warning emitted

```
$ # (re-create benchmarks.yaml alongside the new projects.yaml)
$ HOME=/tmp/uat-rename-home bun -e "...loadProjectRegistry()..."
[agentv] Both .../.agentv/benchmarks.yaml and .../.agentv/projects.yaml exist.
Using projects.yaml; delete benchmarks.yaml when you've confirmed the new file is correct.
[ "alpha", "beta" ]
```

The 4 new migration tests in `packages/core/test/projects.test.ts` cover the same three transitions plus the fresh-install no-op.

Notes on what's intentionally NOT renamed in this PR

HTTP routes like `/api/benchmarks/...` — PR 2.
Wire field names `benchmark_id`, `benchmark_name` in API responses — PR 2.
Studio route `$benchmarkId` URL param and TanStack route files — PR 3.
The private `withBenchmark()` middleware in `serve.ts` — PR 2 (paired with route rename).
The "Multi-benchmark mode" console message and CLI flag help text — PR 2.
Example directories named `*-benchmark` — they're genuinely benchmark suites in the academic sense, so they stay as-is by design.
`benchmark.json` per-run metrics artifact (Agent Skills compatibility) — a different concept; separate cleanup, deferrable.

🤖 Generated with Claude Code

Internal-only rename (PR 1 of 4). The user-facing "benchmark" terminology in HTTP routes (/api/benchmarks/...), JSON field names (benchmark_id, benchmark_name), CLI flags, Studio components, and docs is unchanged in this PR; those land in PR 2 (HTTP API), PR 3 (Studio frontend), and PR 4 (docs).

Renamed:
- packages/core/src/benchmarks.ts → projects.ts
- packages/core/src/benchmark-sync.ts → project-sync.ts
- BenchmarkEntry → ProjectEntry, BenchmarkSource → ProjectSource, BenchmarkRegistry → ProjectRegistry
- loadBenchmarkRegistry → loadProjectRegistry, saveBenchmarkRegistry → saveProjectRegistry, addBenchmark → addProject, removeBenchmark → removeProject, getBenchmark → getProject, touchBenchmark → touchProject, discoverBenchmarks → discoverProjects, deriveBenchmarkId → deriveProjectId, getBenchmarksRegistryPath → getProjectsRegistryPath, syncBenchmark → syncProject, syncBenchmarks → syncProjects
- ~/.agentv/benchmarks.yaml → projects.yaml, top-level key `benchmarks:` → `projects:`

One-time migration:
- loadProjectRegistry() calls migrateLegacyBenchmarksFile() before reading the registry. If only benchmarks.yaml exists, it is read, transformed (top-level key rewritten), written to a temp file, atomically renamed to projects.yaml, and the legacy file is unlinked. If both files exist, projects.yaml wins and a warning is logged. Idempotent: subsequent loads are a no-op.

Rationale: 5 of 6 LLM observability tools (Phoenix, Langfuse, Braintrust, W&B Weave, LangSmith) use "project" for the container that holds eval runs, traces, datasets, and other telemetry. agentv is adding trace/span/latency capture alongside eval runs, making "benchmark" too narrow. The rename also disambiguates from the academic "benchmark = eval suite" usage that survives in example directory names (benchmark-tooling, multi-model-benchmark, etc.).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

cloudflare-workers-and-pages · 2026-05-14T22:48:46Z

Deploying agentv with Cloudflare Pages

Latest commit:	`5d84e67`
Status:	✅ Deploy successful!
Preview URL:	https://68177b2b.agentv.pages.dev
Branch Preview URL:	https://refactor-rename-benchmark-to.agentv.pages.dev


This was referenced May 14, 2026

refactor(api): rename HTTP routes, JSON keys, CLI messages → project (2/4) #1243 (Merged)

refactor(docs): rename docs/skills → project, retain academic uses (4/4) #1245 (Merged)

christso marked this pull request as ready for review May 15, 2026 00:29

christso merged commit 66ffa92 into main May 15, 2026
4 checks passed

christso deleted the refactor/rename-benchmark-to-project branch May 15, 2026 00:41

christso merged 1 commit into main from refactor/rename-benchmark-to-project